Fault-tolerant Distributed Systems

Author

  • Dmitrii Zagorodnov
Abstract

Distributed systems offer two key advantages: they improve the quality of our interaction with computers and increase the quantity of resources available to us. Quality improves because the components of a distributed system can hide failures from users, making a service more reliable (safer from loss) and more available (less downtime). Greater quantity, in the form of computational power and storage capacity, lets users overcome the fundamental scalability limits of a single node and solve larger problems faster. Although these two advantages are obvious in principle, reaping the benefits is often hard in practice. I have done research in both directions: most recently on fault-tolerant distributed systems, and earlier on high-performance parallel systems. All the projects I have worked on share a unifying thread: they enable novel distributed solutions on existing platforms, in a sense working around the constraints of the deployed infrastructure. For example, instead of inventing a new transport protocol, I built a system that hides server failures by relying on standard TCP semantics. The strength of this type of research is that it makes no assumptions about future technological trends but works with what already exists. I believe that accommodating entrenched standards and technologies is crucial for any system's viability and vitality.

Similar resources

Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms

Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems....

Voting Algorithm Based on Adaptive Neuro Fuzzy Inference System for Fault Tolerant Systems

Some applications are critical and must be designed as fault-tolerant systems. A voting algorithm is usually one of the principal elements of a fault-tolerant system. Two kinds of voting algorithm are used in most applications: the majority voting algorithm and the weighted average algorithm. Both have some problems. Majority confronts with the problem of threshold limits and voter of weight...

Channel Reification: a Reflective Approach to Fault-tolerant Software Development

Reflective systems can be used to ease the implementation of fault tolerance mechanisms in distributed applications, as shown in [Anc95, Fab94]. In this paper we introduce a new model for reflective computations, and we show how it can be used for building fault-tolerant applications.

Fault Tolerant Leader Election in Distributed Systems

Many distributed systems use a leader in their logic. When such systems need to be fault tolerant and the current leader suffers a technical problem, it is necessary to apply a special algorithm to choose a new leader. In this paper I present a new fault-tolerant algorithm that elects a new leader based on a random roulette wheel selection.

Real-time Fault-tolerant Scheduling in Heterogeneous Distributed Systems

∗ This work was supported by the National Defense Pre-research Foundation of China. Abstract: Some work has been done on real-time fault-tolerant scheduling algorithms. However, it is all based on homogeneous distributed systems or multiprocessor systems, which have identical processors. This paper presents two fault-tolerant scheduling algorithms, RTFTNO and RTFTRC, for periodic real-t...

Publication date: 2004